Introduction to Time-Series Modelling and Forecasting¶

What is a linear regression?¶

Simple linear regression¶

We observe the values $(x_1, x_2, \ldots, x_n)$ of a predictor variable $X$ and assume the following linear relationship between $X$ and the response $Y$: $$ Y = \beta_0 + \beta_1X + \epsilon$$

Where $\epsilon$ is an error term with $\epsilon \sim N(0,\sigma^2)$

The simple linear regression model can be easily extended to a multiple linear regression model, where we have more than one predictor variable.

Multiple linear regression¶

There are $p$ predictor variables $X_1, X_2, \ldots, X_p$ and we assume the following model: $$Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \ldots + \beta_p X_p + \epsilon$$

Where $\epsilon \sim N(0,\sigma^2)$.

Key assumptions:¶

  • Linearity: The relationship between $X$ and $Y$ is linear.
  • Homoscedasticity: The variance of the residuals is the same for any value of $X$.
  • Independence: The residual terms are independent of each other.
  • Normality: The residuals are normally distributed with mean 0.

How to assess the model fit?¶

Diagnostic plots¶

To check whether the model fits well and the assumptions of normality and homoscedasticity hold, it is helpful to look at the plot of residuals vs. fitted values. Let's look at the following examples, where we assume the simple linear model $Y = \beta_0 + \beta_1X + \epsilon$:

  1. $Y = 10 + 5X + \epsilon$
  2. $Y = 10 + 1.5X + 2X^2 + \epsilon$
  3. $Y = 10 + X + \epsilon \cdot X$
# Residuals vs. Fitted
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

x = np.linspace(0, 12, 50)
error = norm.rvs(size=len(x), scale=5)
y = [None] * 3
y[0] = 10 + 5 * x + error  # good example: linear relationship
y[1] = 10 + 1.5 * x + 2 * x**2 + error  # quadratic relationship
y[2] = 10 + x + error * x  # heteroscedasticity

mods = [sm.OLS(y[i], sm.add_constant(x)).fit() for i in range(3)]

Good example: $Y = 10 + 5X + \epsilon$¶

Bad example - clear relationship: $Y = 10 + 1.5X + 2X^2 + \epsilon$¶

Bad example - heteroscedasticity: $Y = 10 + X + \epsilon \cdot X$¶

Model Fit statistics¶

Coefficient of determination $R^2$¶

$R^2$ is a statistic that will give some information about the goodness of fit of a model. It is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).

An $R^2$ of 1 indicates that the regression predictions perfectly fit the data.

$$R^2 = \frac{\text{RegrSS}}{\text{TotalSS}} = 1 - \frac{\text{ResidSS}}{\text{TotalSS}}$$

$$\text{TotalSS} = \sum_i(y_i - \bar{y})^2 \qquad \text{ResidSS} = \sum_i(y_i - f_i)^2 \qquad \text{RegrSS} = \sum_i(f_i - \bar{y})^2$$

Where $f_i$ denotes the fitted value for observation $i$.
for i in range(3):
    print("R-squared of model %s : %f" % (i + 1, mods[i].rsquared))
R-squared of model 1 : 0.885116
R-squared of model 2 : 0.930183
R-squared of model 3 : 0.010577
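The same statistic can be computed directly from its definition; a small sketch with hypothetical numbers:

```python
import numpy as np

def r_squared(y, fitted):
    # R^2 = 1 - ResidSS / TotalSS
    y = np.asarray(y)
    fitted = np.asarray(fitted)
    resid_ss = np.sum((y - fitted) ** 2)
    total_ss = np.sum((y - np.mean(y)) ** 2)
    return 1 - resid_ss / total_ss

y = np.array([1.0, 2.0, 3.0, 4.0])
print(r_squared(y, y))                     # perfect fit → 1.0
print(r_squared(y, [1.1, 1.9, 3.2, 3.8]))  # near-perfect fit → 0.98
```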

Regression meets Time Series¶

Suppose we observe the values $y_1, \ldots, y_t$ sequentially in time. In this scenario this is the only data that we have; there are no other "predictor" variables. Can we say something about the value of $y_{t+1}$ given all the past observations $y_1, \ldots, y_t$?

Autoregressive model - AR(p)¶

The autoregressive model specifies that the output variable depends linearly on its own previous values:

$$ Y_t = \beta_0 + \beta_1Y_{t-1} + \beta_2Y_{t-2} + \ldots + \beta_pY_{t-p} + \epsilon_t$$

Where $\epsilon_t \overset{\text{i.i.d.}}{\sim} N(0,\sigma^2)$ is a white noise sequence.


Moving Average Model - MA(q)¶

The moving-average model specifies that the output variable depends linearly on the current and past values of the white noise terms:

$$Y_t = \mu + \theta_1\epsilon_{t-1} + \theta_2\epsilon_{t-2} + \ldots + \theta_q\epsilon_{t-q} + \epsilon_t$$

Where $\mu$ is the mean of the series and the $\epsilon_t$ are white noise terms.

Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model. We do not observe the values of
$\epsilon_t$, so it is not really a regression in the usual sense.

# Example moving-average series
N = 60
errors = norm.rvs(size=N, scale=2)
mu = 10
# Use float arrays: np.repeat(10, N) would create an int array and truncate assignments
y1 = np.full(N, mu, dtype=float)  # MA(1)
y3 = np.full(N, mu, dtype=float)  # MA(3)
for t in range(3, N):  # start at 3 so errors[t - 3] does not wrap around
    y1[t] = mu + 1.5 * errors[t - 1] + errors[t]
    y3[t] = mu + 1 * errors[t - 3] + 1.5 * errors[t - 2] + \
            2 * errors[t - 1] + errors[t]

AR(p) and MA(q) Model Assumptions¶

When working with $AR(p)$ and $MA(q)$ models we assume that the time series $Y_t$ is (weakly) stationary, i.e.

  • The mean $\mathbb{E}(Y_t) = \mu$ is constant for all $t$
  • The variance $Var(Y_t)$ is the same for all $t$
  • The autocovariance $Cov(Y_t, Y_{t-k}) = \gamma_k$ depends only on the lag $k$, not on $t$

In practice, this means that no trend or seasonality is present in our time series. We will deal with those later. See below for examples of stationary and non-stationary time series.

N = 150
errors = norm.rvs(size=N, scale=5)
mu = 1
y = np.zeros((N, 4))
for t in range(1, N):
    y[t][0] = mu + errors[t] # Stationary time series
    y[t][1] = t / 2 + errors[t] # Linear trend
    y[t][2] = mu + t * errors[t] # Increasing variance
    y[t][3] = 15 * np.sin(t / 10) + errors[t] # Seasonality

How to recognise $AR(p)$ and $MA(q)$ processes?¶

Autocorrelation ACF¶

$$\text{ACF}_k = Corr(Y_t, Y_{t-k})$$

$MA(q)$: Autocorrelations are zero for lags $k > q$

Partial Autocorrelation PACF¶

$$\text{PACF}_k = Corr(Y_t, Y_{t-k} \ | \ Y_{t-1}, \ldots, Y_{t-k+1})$$

$AR(p)$: Partial autocorrelations are zero for lags $k > p$

What to do if the series is not stationary?¶

Differencing - solution for removing trend and seasonality.¶

Suppose that $Y_t$ is a time series with a linear trend, i.e.:

$$Y_t = \mu + \alpha t + Z_t$$

Where $Z_t$ is a stationary process. Instead of modelling $Y_t$ we can look at $$\tilde{Y_t} = Y_t - Y_{t-1} = \alpha + Z_t - Z_{t-1}$$ Then $\tilde{Y_t}$ is a stationary time series.

To remove a quadratic trend we can difference the series twice.

If a seasonal effect of length $m$ is present (e.g. a 12-month cycle, weekly cycle, etc.), we should consider the series differenced at lag $m$: $$\tilde{Y_t} = Y_t - Y_{t-m}$$

Note: The order of difference operations does not matter.
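This commutativity is easy to verify numerically; a quick sketch with pandas:

```python
import numpy as np
import pandas as pd

# Regular-then-seasonal differencing equals seasonal-then-regular
s = pd.Series(np.random.default_rng(0).normal(size=60).cumsum())
a = s.diff().diff(12)   # lag-1 difference, then lag-12
b = s.diff(12).diff()   # lag-12 difference, then lag-1
print(np.allclose(a.dropna(), b.dropna()))  # → True
```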

# Generate series
N = 365
errors = norm.rvs(size=N, scale=20)
y1 = np.arange(N) + errors  # linear trend
y2 = 0.005 * np.power(np.arange(N), 2) + errors  # quadratic trend
y3 = np.sin(np.arange(0, 365) % 30 / 10) * 80 + errors  # monthly cycle

# Apply differencing
x1 = pd.Series(y1).diff()
x2 = pd.Series(y2).diff().diff()
x3 = pd.Series(y3).diff(30)

Time for the fusion¶

Autoregressive Integrated Moving Average Model¶

$AR + I + MA = ARIMA$¶

The $ARIMA$ model captures all the approaches presented so far:

  • AR: Autoregression. A model that uses the dependent relationship between an observation and some number of lagged observations.
  • I: Integrated. The use of differencing of raw observations to make the time series stationary.
  • MA: Moving Average. A model that uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

Each of these components is explicitly specified in the model as a parameter, using the standard notation $\text{ARIMA}(p,d,q)$:

  • $p$ is the order (number of time lags) of the autoregressive model
  • $d$ is the degree of differencing (the number of times the data have had past values subtracted)
  • $q$ is the order of the moving-average model.

    $$Y_t = \mu + \beta_1Y_{t-1} + \ldots + \beta_pY_{t-p} + \theta_1\epsilon_{t-1} + \ldots + \theta_q\epsilon_{t-q} + \epsilon_t$$

    Here $Y_t$ denotes the series after $d$ rounds of differencing.

ARIMA step by step guide¶

  1. Plot the data, identify unusual observations
  2. If necessary, transform the series to stabilise the variance
  3. Select the model:

    • If necessary, difference the series to remove trend and seasonality
    • Plot the ACF/PACF and try to determine the candidate models
    • Try the chosen models and compare fit statistics to choose the best one

      OR

    • Use automatic selection algorithms to find the best fitting ARIMA model

  4. Fit the model and check the residuals
  5. Calculate the forecasts

Step 1: Plot the data¶

df = pd.read_csv("data/airpassangers.csv", parse_dates=True, index_col=0)
df.columns = ["passangers"]
df["passangers"].plot()
plt.title("Airline Passengers")
plt.show()

Step 2: Transform the data to remove heteroscedasticity¶

np.log(df["passangers"]).plot()
plt.title("Log-transformed Airline Passengers")
plt.show()

Step 3: Model Selection¶

log_passangers = np.log(df["passangers"])
y = log_passangers.diff(12)[12:]
plot_acf_pacf(y)
y.plot()
plt.title("Log-transformed and differenced at lag 12 time series")
plt.show()
# Compare our selection with the auto_arima
auto_model = pmd.auto_arima(y, trace=False, seasonal=False)
auto_model
ARIMA(order=(2, 0, 0), scoring_args={}, suppress_warnings=True)

Step 4: Fit the model and perform model diagnostics¶

# Fit the selected model
model_fit = ARIMA(y, order=(2, 0, 0)).fit()
results = model_fit.get_prediction()
conf_int = results.conf_int(alpha=0.05).values
fitted = results.predicted_mean
model_fit.summary()
                         SARIMAX Results
==================================================================
Dep. Variable:           passangers   No. Observations:        132
Model:               ARIMA(2, 0, 0)   Log Likelihood       233.131
Date:              Thu, 20 Jan 2022   AIC                 -458.262
Time:                      19:58:23   BIC                 -446.731
Sample:                  01-01-1950   HQIC                -453.577
                       - 12-01-1960
Covariance Type:                opg
==================================================================
          coef   std err        z     P>|z|    [0.025    0.975]
------------------------------------------------------------------
const   0.1150     0.017    6.926     0.000     0.082     0.148
ar.L1   0.5540     0.074    7.515     0.000     0.410     0.698
ar.L2   0.2378     0.073    3.252     0.001     0.095     0.381
sigma2  0.0017     0.000    9.762     0.000     0.001     0.002
==================================================================
Ljung-Box (L1) (Q):       0.01   Jarque-Bera (JB):         7.78
Prob(Q):                  0.94   Prob(JB):                 0.02
Heteroskedasticity (H):   0.40   Skew:                     0.33
Prob(H) (two-sided):      0.00   Kurtosis:                 3.99
==================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
model_fit.plot_diagnostics(figsize=(15, 8))
plt.show()

Step 5: Fitted values and forecasting¶

Time to Practice: Algorithmic Trading¶

It's now time to practice and test what we've learnt in time-series modelling by developing a couple of algorithmic trading strategies! Our plan of action will be:

  1. Fetch, clean, and transform our financial time series data
  2. Use our ARIMA modelling skills to forecast stock returns
  3. Implement and backtest simple trading rules to analyse their performance

Fetching our data¶

# Retrieving stock price data 
stocks = ['GOOG', 'AAPL', 'FB', 'AMZN', 'GE', 'NFLX', 'JPM']
          
data = web.DataReader(stocks, 'yahoo', start='2020/11/10', end='2021/11/10')
data.head()
# Select Google closing data
prices = data['Adj Close']['GOOG']

fig, ax = plt.subplots()
ax.plot(prices)
ax.set_title("GOOG Stock Price")
ax.set_xlabel("Date")
ax.set_ylabel("Price ($)")
plt.show()
plot_acf_pacf(prices)
# Get stock returns
returns = prices.pct_change().copy()

fig, ax = plt.subplots()
ax.plot(returns)
ax.set_title("GOOG Stock Returns")
ax.set_xlabel("Date")
ax.set_ylabel("Change in Price (%)")
plt.show()
plot_acf_pacf(returns[1:])

What model do you think is suitable for this stock price?¶

# Compare with the auto_arima
auto_model = pmd.auto_arima(prices, trace=True, stepwise=False)

Our trading strategy¶

  • Start out with a simple strategy
    • Predict returns $\hat{r}_t$
    • Buy if prediction is above $r^{\text{buy}}$, sell if it is below $r^{\text{sell}}$
def arima_trading_backtest(prices, order, buy_ret=0.01, sell_ret=-0.01, train_size=0.3, verbose=True):
    # Preparing returns data
    p, d, q = order
    returns = prices.pct_change()[1:].copy()
    prices = prices[1:]

    # Train-test split and initial model training
    N = returns.shape[0]
    train = returns.head(round(N*train_size))
    test = returns.tail(N-round(N*train_size))
    # Returns are already a first difference of prices, so use d-1
    model = ARIMA(train, order=(p, d - 1, q)).fit()

    # Trading simulation
    holding = False
    balance = 100
    preds = []
    trades = []
    for t in test.index:
        i = prices.index.get_loc(t)
        # Forecast
        pred = model.forecast().iloc[0]  # one-step-ahead forecast (forecast() returns a length-1 Series)
        preds.append(pred)
        # Buying and selling conditions
        trade = False
        if (not holding) and (pred>buy_ret):
            trade = True
            holding = True
            trades.append(('buy', prices.index[i-1]))
            buy_price = prices.iloc[i-1]

        elif holding and (pred<sell_ret):
            trade = True
            holding = False
            sell_price = prices.iloc[i-1]
            profit = (sell_price - buy_price)/buy_price
            trades.append(('sell+' if profit>0 else 'sell-', prices.index[i-1]))
            balance += balance*(profit)

        # Verbose prints
        if trade and verbose:
            if not holding:  
                print(f'Sold @ ${sell_price}, {prices.index[i-1]}')
                print(f'Overall trade gain: {100*profit}%')
            else:   
                print(f'Bought @ ${buy_price}, {prices.index[i-1]}')
            print(f'Predicted return: {pred}')
            print(f'Actual return: {returns[t]}')
            print('')

        # Re-train model    
        model = ARIMA(returns[:t], order=(p,d-1,q)).fit() 

    # Present-value of portfolio if still holding     
    if holding:
        sell_price = prices[t]
        profit = (sell_price - buy_price)/buy_price
        trades.append(('sell+',t) if profit>0 else ('sell-',t))
        balance += balance*(profit)
        
    print(f'Overall Balance: ${balance}')
    return pd.Series(preds, index=test.index), trades

Trading with a simple ARIMA(1,1,0)¶

preds, trades = arima_trading_backtest(prices, order=(1,1,0))

Trading with an ARIMA(0,1,5)¶

preds, trades = arima_trading_backtest(prices, order=(0,1,5))
# Plotting prices
fig, ax = plt.subplots()
ax.plot(prices) 
for trade, t in trades:
    plt.axvline(x=t, color= 'g' if trade=='sell+' else 'r' if trade=='sell-' else 'k' , linestyle='--')

ax.set_title("GOOG Stock Price: ARIMA Algorithmic Trading Backtest")
ax.set_xlabel("Date")
ax.set_ylabel("Price ($)")
plt.show()
# Plotting prices
fig, ax = plt.subplots()
ax.plot(returns[1:])
ax.plot(preds)
for trade, t in trades:
    plt.axvline(x=t, color= 'g' if trade=='sell+' else 'r' if trade=='sell-' else 'k' , linestyle='--')

ax.set_title("GOOG Stock Returns: ARIMA Algorithmic Trading Backtest")
ax.set_xlabel("Date")
ax.set_ylabel("Change in Price (%)")
plt.show()

Does our strategy generalize?¶

for ticker in stocks[1:]:
    prices = data['Adj Close'][ticker] 
    print(ticker)
    arima_trading_backtest(prices, order=(0,1,5), verbose=False)
    print('')

What else can we do?¶

  • Complicate the strategy
    • We know when to buy a stock, but how much should we buy?
    • Can we buy more?
    • Can we short?
  • Hyperparameter optimization
    • What are the best buy/sell thresholds?
    • Try out grid search
    • Feeling fancy? Give Bayesian optimization a go!
  • Asset classes
    • Which asset classes does the model work best for?
    • Can you use a different model on each class? What about each asset?
  • Confidence is key
    • Can you use other moment estimates to help in your trading?
  • Be careful when backtesting
    • Make sure to avoid data leakage and look-ahead bias
    • Careful with overcooking your model
    • Don't forget about transaction costs
    • Use common sense (sometimes the least common of all)